NSF PAR Search | NSF Public Access Repository

DifferentialRegulation: a Bayesian hierarchical approach to identify differentially regulated genes

https://doi.org/10.1093/biostatistics/kxae017

Tiberi, Simone; Meili, Joël; Cai, Peiying; Soneson, Charlotte; He, Dongze; Sarkar, Hirak; Avalos-Pacheco, Alejandra; Patro, Rob; Robinson, Mark D (June 2024, Biostatistics)

Summary: Although transcriptomics data are typically used to analyze mature spliced mRNA, recent attention has focused on jointly investigating spliced and unspliced (or precursor) mRNA, which can be used to study gene regulation and changes in gene expression production. Nonetheless, most methods for spliced/unspliced inference (such as RNA velocity tools) focus on individual samples and rarely allow comparisons between groups of samples (e.g., healthy vs. diseased). Furthermore, this kind of inference is challenging because spliced and unspliced mRNA abundances are characterized by a high degree of quantification uncertainty, due to the prevalence of multi-mapping reads, i.e., reads compatible with multiple transcripts (or genes) and/or with both their spliced and unspliced versions. Here, we present DifferentialRegulation, a Bayesian hierarchical method to discover changes between experimental conditions in the relative abundance of unspliced mRNA (over the total mRNA). We model the quantification uncertainty via a latent variable approach, in which reads are allocated to their gene/transcript of origin and to the respective splice version. We designed several benchmarks in which our approach shows good performance, in terms of sensitivity and error control, compared with state-of-the-art competitors. Importantly, our tool is flexible and works with both bulk and single-cell RNA-sequencing data. DifferentialRegulation is distributed as a Bioconductor R package.
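The read-allocation idea in this summary can be illustrated with a small Python sketch. It is a simplified, deterministic stand-in: an expectation-maximization (EM) style loop that splits multi-mapping reads among compatible (gene, splice version) targets in proportion to the current abundance estimates, then reports each gene's unspliced fraction U / (S + U). The input format (classes of compatible targets with read counts), the `"S"`/`"U"` labels, and the names `allocate_reads` and `unspliced_fraction` are hypothetical. This is not the DifferentialRegulation API or its sampler, and it omits the Bayesian hierarchical model the package uses to compare groups of samples.

```python
from collections import defaultdict

# Hypothetical input format: a list of (compatible_targets, n_reads) pairs.
# Each target is a (gene, version) tuple, with version "S" (spliced) or
# "U" (unspliced). Reads within one class are indistinguishable.


def allocate_reads(classes, n_iter=100):
    """EM-style allocation of multi-mapping reads to (gene, version) targets.

    A simplified illustration of latent read allocation, not the method
    implemented in DifferentialRegulation.
    """
    targets = sorted({t for compat, _ in classes for t in compat})
    if not targets:
        return {}
    # Start from uniform relative abundances over all targets.
    abundance = {t: 1.0 / len(targets) for t in targets}
    counts = defaultdict(float)
    for _ in range(n_iter):
        # E-step: expected number of reads assigned to each target.
        counts = defaultdict(float)
        for compat, n_reads in classes:
            total = sum(abundance[t] for t in compat)
            if total == 0:
                continue  # no compatible target has any mass left
            for t in compat:
                counts[t] += n_reads * abundance[t] / total
        # M-step: renormalise expected counts into relative abundances.
        n_total = sum(counts.values())
        if n_total == 0:
            break
        abundance = {t: counts[t] / n_total for t in targets}
    return dict(counts)


def unspliced_fraction(counts):
    """Per-gene fraction of unspliced reads, U / (S + U)."""
    per_gene = defaultdict(lambda: {"S": 0.0, "U": 0.0})
    for (gene, version), c in counts.items():
        per_gene[gene][version] += c
    return {
        gene: v["U"] / (v["S"] + v["U"])
        for gene, v in per_gene.items()
        if v["S"] + v["U"] > 0
    }


classes = [
    ((("A", "S"),), 60),             # reads unique to spliced A
    ((("A", "U"),), 20),             # reads unique to unspliced A
    ((("A", "S"), ("A", "U")), 20),  # ambiguous between A's two versions
    ((("B", "S"),), 10),
    ((("B", "U"),), 30),
]
print(unspliced_fraction(allocate_reads(classes)))
```

In this toy input, gene A's 20 ambiguous reads converge to a roughly 15:5 split between its spliced and unspliced versions, giving an unspliced fraction of about 0.25, while gene B's fraction of 0.75 needs no allocation at all. A Bayesian treatment would instead sample the allocations and carry that uncertainty into the between-group comparison.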
A junction coverage compatibility score to quantify the reliability of transcript abundance estimates and annotation catalogs

https://doi.org/10.26508/lsa.201800175

Soneson, Charlotte; Love, Michael I; Patro, Rob; Hussain, Shobbir; Malhotra, Dheeraj; Robinson, Mark D (January 2019, Life Science Alliance)

Most methods for statistical analysis of RNA-seq data take a matrix of abundance estimates for some type of genomic feature as their input, and consequently the quality of the results depends directly on the quality of these abundances. Here, we present the junction coverage compatibility score, which provides a way to evaluate the reliability of transcript-level abundance estimates and the accuracy of transcript annotation catalogs. It works by comparing the observed number of reads spanning each annotated splice junction in a genomic region to the predicted number of junction-spanning reads, inferred from the estimated transcript abundances and the genomic coordinates of the corresponding annotated transcripts. We show that although most genes show good agreement between the observed and predicted junction coverages, a small set of genes does not. Genes with poor agreement are found regardless of the method used to estimate transcript abundances, and the corresponding transcript abundances should be treated with care in any downstream analyses.
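To make the observed-versus-predicted comparison concrete, here is a minimal Python sketch that predicts junction-spanning read counts from transcript abundances and exon coordinates. It assumes single-end reads of a fixed length with start positions uniformly distributed along each transcript, plus-strand transcripts with sorted, 1-based inclusive exon coordinates, and abundances expressed as expected read counts; the published score may rely on a different coverage model. The `discrepancy` function is a hypothetical, simplified agreement measure (total absolute difference divided by the total observed count), not the published JCC score formula, and all function names are illustrative.

```python
from collections import defaultdict


def junction_offsets(exons):
    """Map each junction (donor exon end, acceptor exon start) to the number
    of transcript bases upstream of it. Exons: sorted (start, end) tuples,
    1-based inclusive genomic coordinates on the plus strand."""
    offsets, upstream = {}, 0
    for (start, end), (next_start, _) in zip(exons, exons[1:]):
        upstream += end - start + 1
        offsets[(end, next_start)] = upstream
    return offsets


def predicted_junction_counts(transcripts, abundances, read_len):
    """Expected number of reads spanning each junction, assuming uniform
    read start positions along each transcript."""
    predicted = defaultdict(float)
    for tx, exons in transcripts.items():
        length = sum(end - start + 1 for start, end in exons)
        n_starts = length - read_len + 1  # valid 0-based start positions
        if n_starts <= 0:
            continue  # transcript shorter than a read
        for junction, j in junction_offsets(exons).items():
            # A read starting at x covers x .. x + read_len - 1; it spans the
            # junction if it has at least one base on each side of offset j.
            lo = max(0, j - read_len + 1)
            hi = min(j - 1, length - read_len)
            spanning = max(0, hi - lo + 1)
            predicted[junction] += abundances[tx] * spanning / n_starts
    return dict(predicted)


def discrepancy(observed, predicted):
    """Hypothetical agreement measure: summed absolute difference between
    observed and predicted junction counts, relative to the observed total."""
    junctions = set(observed) | set(predicted)
    total_obs = sum(observed.values())
    if total_obs == 0:
        return float("nan")
    diff = sum(abs(observed.get(k, 0) - predicted.get(k, 0.0)) for k in junctions)
    return diff / total_obs


transcripts = {
    "T1": [(1, 100), (201, 300)],
    "T2": [(1, 100), (201, 300), (401, 500)],
}
abundances = {"T1": 151.0, "T2": 251.0}  # expected reads per transcript
pred = predicted_junction_counts(transcripts, abundances, read_len=50)
print(pred)
print(discrepancy({(100, 201): 90, (300, 401): 49}, pred))
```

With 50-base reads, each transcript contributes 49 expected reads to every junction it contains, so the shared junction (100, 201) is predicted at 98 and T2's second junction at 49. An observed count of 90 on the shared junction then gives a discrepancy of 8/139. A junction supported by fewer reads than the abundances imply is the kind of mismatch that flags a gene's transcript estimates for caution.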
RNA Sequencing Data: Hitchhiker's Guide to Expression Analysis

https://doi.org/10.1146/annurev-biodatasci-072018-021255

Van den Berge, Koen; Hembach, Katharina M.; Soneson, Charlotte; Tiberi, Simone; Clement, Lieven; Love, Michael I.; Patro, Rob; Robinson, Mark D. (July 2019, Annual Review of Biomedical Data Science)

Gene expression is the fundamental level at which the results of various genetic and regulatory programs are observable. The measurement of transcriptome-wide gene expression has convincingly switched from microarrays to sequencing in a matter of years. RNA sequencing (RNA-seq) provides a quantitative and open system for profiling transcriptional outcomes on a large scale and therefore supports a wide range of applications, from basic science studies to agricultural and clinical settings. In the past 10 years or so, much has been learned about the characteristics of RNA-seq data sets, as well as about the performance of the myriad methods developed to analyze them. In this review, we give an overview of the developments in RNA-seq data analysis, including experimental design, with an explicit focus on the quantification of gene expression and statistical approaches for differential expression. We also highlight emerging data types, such as single-cell RNA-seq and gene expression profiling using long-read technologies.